Image Matting

Background

The goal is to separate the foreground from the background of an image given some user annotation (e.g., a trimap or scribbles). The prevailing formulation, alpha matting, solves for $\boldsymbol{\alpha}$ (the primary target) and $\mathbf{F}$, $\mathbf{B}$ (subordinate targets) in the compositing equation $\mathbf{I}=\boldsymbol{\alpha}\circ\mathbf{F}+(1-\boldsymbol{\alpha})\circ\mathbf{B}$, where $\circ$ denotes element-wise multiplication [1] [2] [3].
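
For concreteness, a minimal NumPy sketch of this compositing equation (array names and shapes are illustrative, not from any of the cited papers):

```python
import numpy as np

def composite(alpha, fg, bg):
    """Compose an image per I = alpha * F + (1 - alpha) * B.

    alpha: (H, W) matte in [0, 1]; fg, bg: (H, W, 3) float images.
    """
    a = alpha[..., None]          # broadcast alpha over the color channels
    return a * fg + (1.0 - a) * bg
```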

Datasets

Evaluation metrics

  • quantitative: Sum of Absolute Differences (SAD), Mean Squared Error (MSE), Gradient error, Connectivity error; a sketch of the first two appears below.
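
A rough NumPy sketch of SAD and MSE (papers typically restrict these metrics to the unknown trimap region and apply their own scaling; the optional mask below is an illustrative assumption):

```python
import numpy as np

def sad(pred, gt, mask=None):
    """Sum of Absolute Differences, optionally restricted to a boolean region mask."""
    diff = np.abs(pred - gt)
    return diff[mask].sum() if mask is not None else diff.sum()

def mse(pred, gt, mask=None):
    """Mean Squared Error over the (optionally masked) alpha matte."""
    diff = (pred - gt) ** 2
    return diff[mask].mean() if mask is not None else diff.mean()
```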

Methods

  1. Affinity-based [1]: propagate alpha from the known regions to unknown pixels via pixel-similarity metrics built on color similarity or spatial proximity.

  2. Sampling-based [8]: estimate the foreground/background colors of unknown pixels by sampling candidate foreground/background colors from the known pixels; alpha then follows in closed form (see the sketch after this list).

  3. Learning-based

    • With trimap:
      • Encoder-Decoder network [2]: the first end-to-end method for image matting; takes the image and trimap as input, outputs alpha, is trained with an alpha loss and a compositional loss, and refines the alpha with a second stage.
      • DeepMattePropNet [4]: uses deep learning to approximate affinity-based matte propagation; trained with a compositional loss.
      • AlphaGAN [6]: combines an adversarial (GAN) loss with the alpha and compositional losses.
      • Learning-based sampling [7]: replaces hand-crafted sampling with learned estimation of foreground/background colors.
    • Without trimap:
      • Light Dense Network (LDN) + Feathering Block (FB) [3]: generates a coarse segmentation mask and refines it into an alpha matte with the feathering block.
      • T-Net + M-Net [5]: T-Net predicts a trimap as a semantic segmentation task; M-Net then performs matting on it.
      • Background Matting [9]: captures a background image without the subject plus a corresponding photo/video with the subject, so the known background constrains the matte.
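
For the sampling-based family, once colors $\mathbf{F}$ and $\mathbf{B}$ have been sampled for an unknown pixel, alpha has a closed-form least-squares estimate from the compositing equation: $\alpha = \frac{(\mathbf{I}-\mathbf{B})\cdot(\mathbf{F}-\mathbf{B})}{\|\mathbf{F}-\mathbf{B}\|^2}$. A minimal sketch (the sampling strategy itself, e.g., the sparse-coding clustering of [8], is omitted; the function name is illustrative):

```python
import numpy as np

def alpha_from_samples(I, F, B, eps=1e-8):
    """Least-squares alpha for pixel color I given sampled colors F and B.

    Projects (I - B) onto (F - B): alpha = <I - B, F - B> / ||F - B||^2,
    clipped to [0, 1]. All inputs are length-3 RGB vectors.
    """
    fb = F - B
    alpha = np.dot(I - B, fb) / (np.dot(fb, fb) + eps)
    return float(np.clip(alpha, 0.0, 1.0))
```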

Losses

  • gradient loss [11]
  • Laplacian loss [12]
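
A rough sketch of common matting losses under the compositing formulation above (a hedged illustration; exact definitions and weights differ across [2], [11], [12]). The Laplacian loss of [12] compares Laplacian-pyramid decompositions of the predicted and ground-truth mattes at multiple scales and is omitted for brevity:

```python
import numpy as np

def alpha_loss(pred, gt, eps=1e-6):
    """Charbonnier-style absolute difference on the alpha matte (as in [2])."""
    return np.mean(np.sqrt((pred - gt) ** 2 + eps ** 2))

def compositional_loss(pred, gt, fg, bg, eps=1e-6):
    """Compare images recomposited with predicted vs. ground-truth alpha."""
    comp_pred = pred[..., None] * fg + (1 - pred[..., None]) * bg
    comp_gt = gt[..., None] * fg + (1 - gt[..., None]) * bg
    return np.mean(np.sqrt((comp_pred - comp_gt) ** 2 + eps ** 2))

def gradient_loss(pred, gt):
    """L1 difference between spatial gradients of predicted and true alpha."""
    gy_p, gx_p = np.gradient(pred)
    gy_g, gx_g = np.gradient(gt)
    return np.mean(np.abs(gx_p - gx_g) + np.abs(gy_p - gy_g))
```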

Extension

Omnimatte [10]: segments objects together with the scene effects associated with them (shadows, reflections, smoke).

User-guided Image Matting

Unified interactive image matting [13]

References

[1] Aksoy, Yagiz, Tunc Ozan Aydin, and Marc Pollefeys. “Designing effective inter-pixel information flow for natural image matting.” CVPR, 2017.

[2] Xu, Ning, et al. “Deep image matting.” CVPR, 2017.

[3] Zhu, Bingke, et al. “Fast deep matting for portrait animation on mobile phone.” ACM MM, 2017.

[4] Wang, Yu, et al. “Deep Propagation Based Image Matting.” IJCAI, 2018.

[5] Chen, Quan, et al. “Semantic Human Matting.” ACM MM, 2018.

[6] Lutz, Sebastian, Konstantinos Amplianitis, and Aljosa Smolic. “AlphaGAN: Generative adversarial networks for natural image matting.” BMVC, 2018.

[7] Tang, Jingwei, et al. “Learning-based Sampling for Natural Image Matting.” CVPR, 2019.

[8] Feng, Xiaoxue, Xiaohui Liang, and Zili Zhang. “A cluster sampling method for image matting via sparse coding.” ECCV, 2016.

[9] Sengupta, Soumyadip, et al. “Background Matting: The World is Your Green Screen.” CVPR, 2020.

[10] Lu, Erika, et al. “Omnimatte: Associating Objects and Their Effects in Video.” CVPR, 2021.

[11] Zhang, Yunke, et al. “A late fusion CNN for digital matting.” CVPR, 2019.

[12] Hou, Qiqi, and Feng Liu. “Context-aware image matting for simultaneous foreground and alpha estimation.” ICCV, 2019.

[13] Yang, Stephen, et al. “Unified interactive image matting.” arXiv preprint arXiv:2205.08324 (2022).